Storage APIs

Google Cloud Datalab provides an easy environment for working with your data, including data managed within Google Cloud Storage. This notebook introduces some of the APIs that Datalab provides for working with Google Cloud Storage.

You've already seen the use of %%gcs commands in the Storage Commands notebook. These commands are built using the same Storage APIs that are available for your own use.

For context, items or files held in Cloud Storage are called objects, and they are immutable once written. Objects are organized into buckets, and each object is identified within its bucket by a unique key.
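To make the bucket/key structure concrete, here is a small illustrative helper (not part of the Datalab API) that splits a `gs://` URL into its bucket and object key:

```python
def parse_gcs_url(url):
    """Split a gs:// URL into (bucket, key).

    Illustrative helper only -- not part of google.datalab.storage.
    """
    if not url.startswith('gs://'):
        raise ValueError('not a gs:// URL: ' + url)
    bucket, _, key = url[len('gs://'):].partition('/')
    return bucket, key

print(parse_gcs_url('gs://cloud-datalab-samples/httplogs/logs_sample.csv'))
# ('cloud-datalab-samples', 'httplogs/logs_sample.csv')
```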

Importing the API

The Datalab APIs are provided in the google.datalab Python library, and the Cloud Storage functionality is contained within the google.datalab.storage module.


In [1]:
import google.datalab.storage as storage

First, we will get our project id so we can construct an appropriate Cloud Storage path. Run this code in your own project:


In [4]:
from google.datalab import Context
import random, string

project = Context.default().project_id
suffix = ''.join(random.choice(string.ascii_lowercase) for _ in range(5))
sample_bucket_name = project + '-datalab-samples-' + suffix
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'

print('Bucket: ' + sample_bucket_path)
print('Object: ' + sample_bucket_object)


Bucket: gs://mysampleproject-datalab-samples-abcde
Object: gs://mysampleproject-datalab-samples-abcde/Hello.txt

Buckets

Referencing and Enumerating

A Bucket reference can be created from its name, and then enumerated to list the objects it contains.


In [5]:
shared_bucket = storage.Bucket('cloud-datalab-samples')
for obj in shared_bucket.objects():
  if '/' not in obj.key:
    print(obj.key)


applogs
cars.csv
cars2.csv
hello.txt
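The filter in the loop above keeps only keys that contain no `/`, i.e. the "top level" of the bucket. The same logic can be seen with plain strings (the keys below mirror the listing above, plus one hypothetical nested key):

```python
# Simulated object keys; the last one is nested and should be filtered out.
keys = ['applogs', 'cars.csv', 'cars2.csv', 'hello.txt',
        'httplogs/logs_sample.csv']

# Keep only keys with no '/', i.e. objects at the top level of the bucket.
top_level = [k for k in keys if '/' not in k]
print(top_level)
# ['applogs', 'cars.csv', 'cars2.csv', 'hello.txt']
```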

Since a bucket may contain many objects, enumeration can also be filtered on the server side, here using a key prefix and a delimiter.


In [6]:
for obj in shared_bucket.objects(prefix='httplogs/', delimiter='/'):
  print(obj.key)


httplogs/logs20140615.csv
httplogs/logs20140616.csv
httplogs/logs20140617.csv
httplogs/logs20140618.csv
httplogs/logs20140619.csv
httplogs/logs20140620.csv
httplogs/logs_sample.csv
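The `prefix` restricts results to keys that start with the given string, and the `delimiter` stops the listing from descending into deeper "subdirectories". A rough plain-Python sketch of that filtering (the real work happens server-side in the Cloud Storage API, which additionally reports the rolled-up prefixes):

```python
def list_keys(keys, prefix='', delimiter=None):
    """Approximate Cloud Storage prefix/delimiter filtering in plain Python.

    Illustration only: the real API also returns the rolled-up
    sub-prefixes, which this sketch simply omits.
    """
    results = []
    for k in keys:
        if not k.startswith(prefix):
            continue
        rest = k[len(prefix):]
        # With a delimiter, keys nested more than one level deep are omitted.
        if delimiter and delimiter in rest:
            continue
        results.append(k)
    return results

keys = ['cars.csv', 'httplogs/logs_sample.csv', 'httplogs/archive/old.csv']
print(list_keys(keys, prefix='httplogs/', delimiter='/'))
# ['httplogs/logs_sample.csv']
```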

Creating


In [7]:
sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()


Out[7]:
True

Objects

Creating, Writing and Reading


In [8]:
sample_object = sample_bucket.object('sample.txt')
sample_object.write_stream('Some sample text', 'text/plain')

In [14]:
list(sample_bucket.objects())


Out[14]:
[Google Cloud Storage Object gs://mysampleproject-datalab-samples-abcde/sample.txt]

In [10]:
sample_object.metadata.size


Out[10]:
16

In [11]:
sample_text = sample_object.read_stream()
print(sample_text)


Some sample text
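The size of 16 reported in the object's metadata above is simply the byte length of the content that was written:

```python
content = 'Some sample text'
# The object's metadata size is the byte length of its content.
print(len(content.encode('utf-8')))
# 16
```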

Deleting


In [12]:
sample_object.exists()


Out[12]:
True

In [13]:
sample_object.delete()
sample_bucket.delete()